@jmarble jmarble commented Nov 18, 2025

Summary

Fixes #20262 by making SORT_REGULAR fall back to a fully transitive comparator whenever loose comparison semantics would otherwise be non-transitive (numeric strings vs ints/floats, enums, nested arrays/objects). This keeps duplicates grouped so array_unique() and the sort family behave consistently.
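To illustrate what "non-transitive" means here, the following is a minimal sketch in C of a loose string comparison: numeric when both operands look numeric, byte-wise otherwise. The names `is_numeric_model` and `loose_cmp_model` are hypothetical, and `strtod()` is only an approximation of PHP's numeric-string rules, so this is a model and not the engine's code, nor the repro from the issue.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a numeric-string check: the whole string must
 * parse as a number. strtod() accepts forms PHP rejects (hex, "inf"),
 * so treat this as a model only. */
static int is_numeric_model(const char *s, double *out)
{
	char *end;
	*out = strtod(s, &end);
	return end != s && *end == '\0';
}

/* Loose comparison model: numeric if both sides are numeric strings,
 * byte-wise otherwise. Returns -1, 0 or 1. */
static int loose_cmp_model(const char *a, const char *b)
{
	double da, db;
	if (is_numeric_model(a, &da) && is_numeric_model(b, &db)) {
		return (da > db) - (da < db);
	}
	int r = strcmp(a, b);
	return (r > 0) - (r < 0);
}
```

Under this model "10" equals "1e1", yet "10" sorts before "1a" and "1a" sorts before "1e1". A sort can therefore place "1a" between the two equal values, and a de-duplication pass that only checks neighbours will not see them as duplicates.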

Highlights

  • Introduce php_array_compare_transitive()/php_array_compare_transitive_objects() and wire all SORT_REGULAR compare dispatchers (php_get_*_compare_func{,_reverse,_unstable}) to the generated *_regular comparators, keeping array_unique()/diff/intersect on the same transitive path.
  • Make enum SORT_REGULAR ordering deterministic: backed enums sort by backing value, unit enums by case name with object handle as tie-breaker.
  • Add regression tests for array_unique() (scalars, objects, nested arrays), sort()/ksort() numeric-string edge cases, and enum ordering stability.
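The enum ordering rule in the second bullet can be sketched as follows. This is a hedged model in C: `enum_case_model` and `enum_cmp_model` are hypothetical names, only int-backed enums are modelled, and the handling of a backed case compared with a unit case (falling through to the name comparison) is an assumption the PR text does not specify.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an enum case; not the engine's representation. */
typedef struct {
	int         is_backed;  /* 1 for backed enums, 0 for unit enums */
	long        backing;    /* model: int-backed only */
	const char *case_name;
	uint32_t    handle;     /* object handle, used only as a tie-breaker */
} enum_case_model;

static int enum_cmp_model(const enum_case_model *a, const enum_case_model *b)
{
	/* Backed enums: order by backing value. */
	if (a->is_backed && b->is_backed) {
		return (a->backing > b->backing) - (a->backing < b->backing);
	}
	/* Unit enums: order by case name, then by handle for determinism. */
	int r = strcmp(a->case_name, b->case_name);
	if (r != 0) {
		return (r > 0) - (r < 0);
	}
	return (a->handle > b->handle) - (a->handle < b->handle);
}
```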

Performance Impact

Status: Unoptimized Implementation

This initial implementation prioritizes correctness and transitivity to fix the underlying stability issues. It is not yet optimized, relying on a generalized dispatch mechanism to ensure the logic holds up under review.

As a result, there are known regressions in specific comparison operations due to the overhead of the new dispatch logic:

  • Mixed-Type Comparisons: ~7–10% regression (cost of enforcing transitivity).
  • Standard Key Sorts (ksort/krsort): ~10% regression (general overhead).

However, even in this unoptimized state, the new architecture yields significant wins in common scenarios:

  • Integer Sorts: ~16% faster.
  • Associative Arrays (Mixed Alphanumeric Keys): >30% faster.

Roadmap:
Once the transitive comparison logic is approved, I will submit a follow-up PR with finely tuned optimizations. These changes are expected to eliminate the current regressions and dramatically improve performance across all remaining operations.

Member

@Girgias Girgias left a comment

Various comments and questions. This also needs a rebase, as I refactored the sorting code to remove a bunch of duplication.

Comment on lines +460 to +406
static int php_array_hash_compare_transitive(zval *zv1, zval *zv2) /* {{{ */
{
	return php_array_compare_transitive(zv1, zv2);
}
/* }}} */
Member

Ditto

Author

Kept this one so we can pass a compare_func_t to zend_hash_compare().
php_array_compare_transitive() doesn’t match that signature, so we still need this tiny adapter.

Author

My previous comment is no longer valid: this wrapper can be removed. However, I noticed a measurable regression in my benchmarks after removing it, so I decided to keep it in place. I should probably add a comment in the function explaining that.

Member

How are you benchmarking? Because I don't really see why it would regress?

Author

@jmarble jmarble Nov 23, 2025

I built a benchmark that evaluates all comparison ops, sort family, and all the array functions that use SORT_REGULAR -- 142 ops in total. I have a dedicated FreeBSD server in my office that I use for benchmarking. Wall-clock CV stays below 0.1%, so the deltas are solid.

When I remove the wrapper and pass php_array_compare_transitive() directly to zend_hash_compare(), 66/142 ops get slower (Time-Weighted ΔMedian%: 0.03%).

With the wrapper in place I see 60/142 ops get slower (Time-Weighted ΔMedian%: -1.20%).

My best guess is that the wrapper keeps php_array_compare_transitive() inlinable at its other call sites; once its address escapes through zend_hash_compare() the compiler stops inlining it into the rest of array.c.

Is there another way to keep those call sites inlined while still satisfying the desire to avoid the wrapper?
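For readers following the thread, the pattern being discussed can be sketched in plain C. All names here (`cmp_core`, `cmp_adapter`, `first_mismatch`, `count_descents`) are hypothetical and stand in for the php-src functions. The sketch only shows the shape of the code; whether a given compiler changes its inlining decisions because of the adapter is the author's guess and is not demonstrated here.

```c
#include <stddef.h>

typedef int (*long_cmp_fn)(const long *, const long *);

/* Core comparator: only ever called directly, so its address never escapes. */
static inline int cmp_core(const long *a, const long *b)
{
	return (*a > *b) - (*a < *b);
}

/* Thin adapter: the one symbol that is handed out as a function pointer. */
static int cmp_adapter(const long *a, const long *b)
{
	return cmp_core(a, b);
}

/* Generic routine that needs a callback (plays the role of a hash-compare API). */
static int first_mismatch(const long *x, const long *y, size_t n, long_cmp_fn cmp)
{
	for (size_t i = 0; i < n; i++) {
		int r = cmp(&x[i], &y[i]);
		if (r != 0) {
			return r;
		}
	}
	return 0;
}

/* Hot path that calls the core directly, leaving the compiler free to inline it. */
static size_t count_descents(const long *v, size_t n)
{
	size_t c = 0;
	for (size_t i = 1; i < n; i++) {
		if (cmp_core(&v[i - 1], &v[i]) > 0) {
			c++;
		}
	}
	return c;
}
```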

}
/* }}} */

static int php_array_compare_transitive(zval *op1, zval *op2)
Member

I don't understand why you need this? I thought the transitivity issue was only about numeric strings and numeric values?

Author

I kept php_array_compare_transitive() because once we decide an array needs the "regular" fallback (e.g. it mixes numeric strings and numbers, or it contains arrays/objects whose elements do), we have to apply that stricter comparison recursively. zend_compare() would hit the same non‑transitive behavior when it dives into nested arrays or object properties, so this helper still wraps the recursive walk and reuses the numeric‑string handling (plus the enum ordering) at each level.
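The recursive requirement described above can be sketched with a small tagged value type. `value_model`, `cmp_scalar_model` and `cmp_nested_model` are hypothetical names, and the scalar rule is a plain integer comparison standing in for the stricter numeric-string handling. The point of the sketch is only that the nested walk calls the same comparator at every level instead of handing elements to a different, default comparison.

```c
#include <stddef.h>

typedef enum { V_LONG, V_ARRAY } vkind_model;

/* Hypothetical value: either a long or an array of values. */
typedef struct value_model {
	vkind_model               kind;
	long                      num;
	const struct value_model *items;
	size_t                    count;
} value_model;

/* Placeholder for the stricter scalar rule. */
static int cmp_scalar_model(long a, long b)
{
	return (a > b) - (a < b);
}

static int cmp_nested_model(const value_model *a, const value_model *b)
{
	if (a->kind != b->kind) {
		/* Model: an array compares greater than a scalar. */
		return a->kind == V_LONG ? -1 : 1;
	}
	if (a->kind == V_LONG) {
		return cmp_scalar_model(a->num, b->num);
	}
	if (a->count != b->count) {
		return a->count < b->count ? -1 : 1;
	}
	for (size_t i = 0; i < a->count; i++) {
		/* Recurse with the same comparator so nested levels obey the same order. */
		int r = cmp_nested_model(&a->items[i], &b->items[i]);
		if (r != 0) {
			return r;
		}
	}
	return 0;
}
```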

Member

Right... this is annoying, to say the least. Please expand the comment to explain that what you are overriding is enum, array, and object comparison. You should probably also add a comment near zend_compare() as a reminder to apply any changes there back to this function.

Or maybe just check that the values are enums, arrays, or objects and then defer to zend_compare() for everything else to prevent duplication.

Author

We need the full php_array_compare_transitive() matrix because the non-transitivity isn't limited to enums/arrays/objects. The original bug repros with scalars, so if we just short-circuited to zend_compare() for "everything else," we'd immediately fall back into the non-transitive ordering we're trying to fix. That's why the helper mirrors zend_compare() and overrides the handful of cases that can become non-transitive.

I can definitely expand the comment to spell that out and add a "keep in sync with zend_compare()" note near the helper, but we can't simply defer to zend_compare() for the scalar cases without reintroducing the bug.

Author

jmarble commented Nov 18, 2025

@Girgias thank you for taking the time to provide the careful review! Looks like I was able to capture your sorting code refactor when I created this new branch. I'll push a fresh commit that addresses your code comments. Thanks again for the help!

Member

@Girgias Girgias left a comment

Please only do the fix for the transitivity.

Optimizations can be decided later, but currently it just pollutes the PR and makes it harder to review and merge.

@jmarble jmarble marked this pull request as draft November 21, 2025 16:15
@jmarble jmarble force-pushed the fix-php-array-sort-regular branch from 374a660 to 2ff1700 Compare November 21, 2025 22:03
Author

jmarble commented Nov 21, 2025

@Girgias yes, I clearly got a bit carried away haha. I decided to reimplement and force push a clean commit. Sorry for the mess I made of this PR.

I have a bag full of optimizations we can save for a follow-up PR. One worth calling out would be to split zendi_smart_strcmp() so the transitive comparator doesn’t need to re-run the non-transitive fast paths. I also found an opportunity to add a single-bucket fast path in zend_compare_symbol_tables() which showed close to 1.25x speedup on array comparison.

@jmarble jmarble marked this pull request as ready for review November 21, 2025 22:48
Comment on lines +439 to +434
/* Mirrors zend_std_compare_objects(), but recurses via php_array_compare_transitive()
* so nested properties obey SORT_REGULAR's transitive ordering. */
static int php_array_compare_transitive_objects(zval *o1, zval *o2) /* {{{ */
Member

What I think might make more sense is to create a zend_std_compare_objects_ex() function that takes a function pointer for the prop table comparison if this is identical.

Hopefully the compiler will inline the behaviour properly in zend_std_compare_objects(), so it should be equivalent. For quite a while I was trying to understand what the point of this function is.
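A rough sketch of the suggested `_ex` shape, in plain C: one shared walk over the properties that takes the per-property comparison as a function pointer, plus a thin default entry point. The names (`props_compare_ex`, `props_compare`, `cmp_exact`, `cmp_folded`) and the use of single characters as "properties" are assumptions for illustration; the real zend_std_compare_objects() does considerably more.

```c
#include <ctype.h>
#include <stddef.h>

typedef int (*prop_cmp_fn)(char, char);

/* Default per-property comparison. */
static int cmp_exact(char a, char b)
{
	return (a > b) - (a < b);
}

/* Alternative semantics (case-folded), standing in for a transitive variant. */
static int cmp_folded(char a, char b)
{
	int fa = tolower((unsigned char)a), fb = tolower((unsigned char)b);
	return (fa > fb) - (fa < fb);
}

/* Shared walk: the only copy of the loop, parameterised by the comparator. */
static int props_compare_ex(const char *p1, const char *p2, size_t n, prop_cmp_fn cmp)
{
	for (size_t i = 0; i < n; i++) {
		int r = cmp(p1[i], p2[i]);
		if (r != 0) {
			return r;
		}
	}
	return 0;
}

/* Existing entry point keeps its signature and forwards the default. */
static int props_compare(const char *p1, const char *p2, size_t n)
{
	return props_compare_ex(p1, p2, n, cmp_exact);
}
```

A caller that needs the alternative ordering would call `props_compare_ex(p1, p2, n, cmp_folded)` directly.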

Author

@jmarble jmarble Nov 23, 2025

I just gave it a try and benchmarked it. I saw a small, almost negligible regression: Time-Weighted ΔMedian% rose by about 0.9 percentage points (from -1.20% to -0.31%).

Author

I tried a different idea. I created a zend_object_compare_kind enum in zend_object_handlers.h and added zend_std_compare_objects_ex(), so the standard object comparator can flip between zend_compare() and a transitive variant (zend_compare_transitive()) without going through a function-pointer callback.

To make that transitive mode reusable everywhere, I moved the SORT_REGULAR compare logic into Zend itself (zend_compare_transitive(), plus zend_compare_symbol_tables_transitive() and the enum-aware helpers).

This design showed a negligible difference (within measurement noise) in my benchmarks compared to the current implementation.

I'm happy to push another commit with this change if you'd like to see it.
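The enum-based alternative can be sketched like this. The members of `cmp_kind_model` are invented for illustration (the thread names zend_object_compare_kind but does not list its values), and the case-folded comparison is only a placeholder for the transitive variant. Because each entry point passes a constant mode to a static inline helper, a compiler may be able to specialise the loop per mode without an indirect call, though that is not guaranteed.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical mode selector. */
typedef enum { CMP_KIND_DEFAULT, CMP_KIND_ALTERNATE } cmp_kind_model;

/* Per-property comparison selected by mode instead of by function pointer. */
static inline int cmp_one(char a, char b, cmp_kind_model kind)
{
	int x = (unsigned char)a, y = (unsigned char)b;
	if (kind == CMP_KIND_ALTERNATE) {
		x = tolower(x);
		y = tolower(y);
	}
	return (x > y) - (x < y);
}

static inline int props_compare_kind(const char *p1, const char *p2, size_t n,
                                     cmp_kind_model kind)
{
	for (size_t i = 0; i < n; i++) {
		int r = cmp_one(p1[i], p2[i], kind);
		if (r != 0) {
			return r;
		}
	}
	return 0;
}

/* Entry points pass a compile-time constant mode. */
static int props_compare_default(const char *p1, const char *p2, size_t n)
{
	return props_compare_kind(p1, p2, n, CMP_KIND_DEFAULT);
}

static int props_compare_alternate(const char *p1, const char *p2, size_t n)
{
	return props_compare_kind(p1, p2, n, CMP_KIND_ALTERNATE);
}
```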

@jmarble jmarble marked this pull request as draft November 23, 2025 21:51
- Add zend_compare_{long,double}_to_string_ex() plus
  zendi_smart_strcmp_ex() so SORT_REGULAR can invoke transitive-aware
  scalar comparisons without touching zend_compare()
- Introduce php_array_compare_transitive() (pared-down zend_compare())
  and php_array_compare_transitive_objects() (mirrors
  zend_std_compare_objects()) so arrays, objects, and enums recurse with
  transitive ordering
- Route the public sort APIs and array_unique() through
  php_array_sort_regular() so PHP_SORT_REGULAR always uses the new
  comparator
- Add regression tests: phpGH-20262 (array_unique with enums/objects/nested
  arrays) plus SORT_REGULAR consistency tests for sort()/ksort() on
  numeric-string edge cases

Fixes: phpGH-20262
- Make every php_get_*_compare_func{,_reverse,_unstable} return the
  *_regular variants so the public sort APIs no longer need
  php_array_sort_regular()
- Drop php_array_sort_regular() and the old key/data compare impl
  helpers now that their logic lives in the generated *_regular
  comparators
- Have array_unique() fetch its unstable comparator exclusively through
  php_get_data_compare_func_unstable(), matching the rest of the sort
  entry points
- Compare backed enums via their stored backing values so SORT_REGULAR’s
  common path no longer fetches and compares case names; unit enums
  still fall back to case-name ordering, with object handles as the
  deterministic tie-breaker
- Add ext/standard/tests/array/sort/sort_enum_stability.phpt to ensure
  both unit and backed enums produce the same sorted order regardless of
  access order
@jmarble jmarble force-pushed the fix-php-array-sort-regular branch from 2ff1700 to 9026917 Compare November 24, 2025 06:11
- Remove DEFINE_SORT_VARIANTS_USING macro layer
- Inline the implementation directly in DEFINE_SORT_VARIANTS
- Move enum helper functions after DEFINE_SORT_VARIANTS usage
…helper

- Replace php_array_apply_sort with php_sort that handles parameter parsing
- Consolidate duplicate parameter parsing code across asort, arsort, sort, rsort, krsort, and ksort
- Each sort function now simply calls php_sort with appropriate compare function and renumber flag
@jmarble jmarble marked this pull request as ready for review November 25, 2025 01:52
Apply IEEE 754 totalOrder predicate for NaN handling in transitive
SORT_REGULAR comparisons. This provides a consistent, deterministic
ordering where NaN values sort after +INF but before non-numeric
strings:

  -INF < finite numbers < +INF < NaN < non-numeric strings
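The numeric part of that ordering can be sketched as a double comparator that places NaN after +INF. This is a simplified model: the function name is hypothetical, all NaNs are treated as equal to each other regardless of sign or payload (which the full IEEE 754 totalOrder predicate does not do), and the position of non-numeric strings is left out.

```c
#include <math.h>

/* Orders doubles as -INF < finite < +INF < NaN; NaNs compare equal to each other. */
static int double_cmp_nan_last(double a, double b)
{
	int a_nan = isnan(a) != 0;
	int b_nan = isnan(b) != 0;
	if (a_nan || b_nan) {
		/* A NaN sorts after any non-NaN; two NaNs tie. */
		return a_nan - b_nan;
	}
	return (a > b) - (a < b);
}
```

With ordinary `<` and `>` every comparison involving NaN is false, so NaN would look "equal" to everything. Handling it explicitly gives the sort a single, consistent position for it.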
@jmarble jmarble marked this pull request as draft November 25, 2025 23:00
Development

Successfully merging this pull request may close these issues.

array_unique() with SORT_REGULAR returns duplicate values
